library(knitr)
library(tidyverse)
cocktails <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-26/cocktails.csv')
# Converting every character to lower case:
cocktails <- cocktails %>%
mutate(alcoholic = str_to_lower(alcoholic),
glass = str_to_lower(glass),
ingredient = str_to_lower(ingredient))
cocktails %>%
filter(ingredient_number == 1) %>%
count(alcoholic, category, glass, sort = TRUE)
## # A tibble: 92 x 4
## alcoholic category glass n
## <chr> <chr> <chr> <int>
## 1 alcoholic Ordinary Drink cocktail glass 90
## 2 alcoholic Ordinary Drink highball glass 51
## 3 alcoholic Ordinary Drink collins glass 50
## 4 alcoholic Ordinary Drink old-fashioned glass 43
## 5 alcoholic Shot shot glass 37
## 6 alcoholic Cocktail cocktail glass 29
## 7 non alcoholic Other/Unknown highball glass 20
## 8 alcoholic Ordinary Drink whiskey sour glass 16
## 9 alcoholic Homemade Liqueur collins glass 10
## 10 alcoholic Ordinary Drink champagne flute 10
## # ... with 82 more rows
cocktails %>%
count(alcoholic, ingredient, sort = TRUE)
## # A tibble: 403 x 3
## alcoholic ingredient n
## <chr> <chr> <int>
## 1 alcoholic vodka 87
## 2 alcoholic gin 84
## 3 alcoholic orange juice 51
## 4 alcoholic lemon juice 50
## 5 alcoholic sugar 49
## 6 alcoholic lemon 44
## 7 alcoholic light rum 42
## 8 alcoholic amaretto 39
## 9 alcoholic triple sec 38
## 10 alcoholic grenadine 34
## # ... with 393 more rows
Gin, Vodka, Orange & Lemon Juice are most common ingredients. Using these counts can do bar plot stuff but maybe later, I thought to implement each of my last semester learnings in upcoming #tidytuesday datasets.
Need to transform data into transaction type matrix which can then be directly used in arules R package. Single cocktail has as many rows as its ingredients, so keeping only single rows for each cocktail with all ingredients stored as list.
library(arules)
singles <- cocktails %>%
mutate(ingredient = factor(ingredient)) %>%
group_by(row_id) %>%
summarize(ingredlist = list(ingredient))
#Attaching name to each set of ingredient with cocktail's row_id
names(singles$ingredlist) <- singles$row_id
Some cocktails have duplicate ingredients in data, like below:
ingredient ‘food color’ is present multiple times having different values in ‘measure’)
Hence the warning.
ingreds <- as(singles$ingredlist, "transactions")
## Warning in asMethod(object): removing duplicated items in transactions
rules <- apriori(ingreds, parameter = list(support = 0.01, confidence = 0.2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[306 item(s), 546 transaction(s)] done [0.00s].
## sorting and recoding items ... [91 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [88 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
#After multiple threshold setting, choosing 0.01 and 0.2 as it returns suitable amount of rules.
summary(rules)
## set of 88 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 66 18 4
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.000 2.295 2.250 4.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01099 Min. :0.2045 Min. :0.01282 Min. : 1.362
## 1st Qu.:0.01282 1st Qu.:0.2848 1st Qu.:0.02152 1st Qu.: 2.418
## Median :0.01465 Median :0.4361 Median :0.03755 Median : 4.234
## Mean :0.01642 Mean :0.4936 Mean :0.04225 Mean : 7.331
## 3rd Qu.:0.01832 3rd Qu.:0.6875 3rd Qu.:0.05128 3rd Qu.: 7.913
## Max. :0.04212 Max. :1.0000 Max. :0.12637 Max. :34.125
## count
## Min. : 6.000
## 1st Qu.: 7.000
## Median : 8.000
## Mean : 8.966
## 3rd Qu.:10.000
## Max. :23.000
##
## mining info:
## data ntransactions support confidence
## ingreds 546 0.01 0.2
sub_rules <- inspect(head(rules, n = 10, by = "lift"))
kable(sub_rules)
| lhs | rhs | support | confidence | coverage | lift | count | ||
|---|---|---|---|---|---|---|---|---|
| [1] | {lemon juice,orange,sugar} | => | {maraschino cherry} | 0.0146520 | 1.0000000 | 0.0146520 | 34.12500 | 8 |
| [2] | {lemon juice,orange} | => | {maraschino cherry} | 0.0146520 | 0.8000000 | 0.0183150 | 27.30000 | 8 |
| [3] | {orange,sugar} | => | {maraschino cherry} | 0.0201465 | 0.7333333 | 0.0274725 | 25.02500 | 11 |
| [4] | {lemon juice,maraschino cherry} | => | {orange} | 0.0146520 | 1.0000000 | 0.0146520 | 23.73913 | 8 |
| [5] | {maraschino cherry,sugar} | => | {orange} | 0.0201465 | 1.0000000 | 0.0201465 | 23.73913 | 11 |
| [6] | {lemon juice,maraschino cherry,sugar} | => | {orange} | 0.0146520 | 1.0000000 | 0.0146520 | 23.73913 | 8 |
| [7] | {lemon juice,sugar} | => | {maraschino cherry} | 0.0146520 | 0.6153846 | 0.0238095 | 21.00000 | 8 |
| [8] | {light cream} | => | {creme de cacao} | 0.0109890 | 0.3750000 | 0.0293040 | 20.47500 | 6 |
| [9] | {creme de cacao} | => | {light cream} | 0.0109890 | 0.6000000 | 0.0183150 | 20.47500 | 6 |
| [10] | {carbonated water,lemon} | => | {powdered sugar} | 0.0128205 | 1.0000000 | 0.0128205 | 19.50000 | 7 |
Package arulesViz gives an extensive options for visual representations of association rules:
library(arulesViz)
library(htmlwidgets)
library(htmltools)
top_rules <- head(rules, n = 10, by = "lift")
p <- plot(rules, method = "graph", engine = "htmlwidget")
#Making it pretty with html tags and transparency taken care using manual css:
p <- prependContent(p, tags$h1("Relation between Ingredients of Cocktails"))
p <- prependContent(p, tags$p("Size corresponds to Support"))
p <- prependContent(p, tags$p("Color corresponds to Lift"))
p <- prependContent(p, tags$p("Scroll up/down to Zoom and Hover for details"))
p <- prependContent(p, tags$script('document.body.style.backgroundImage = "url(https://static.vecteezy.com/system/resources/previews/000/444/109/non_2x/vector-alcohol-cocktails-icons-black.jpg)"'))
p
Size corresponds to Support
Color corresponds to Lift
Scroll up/down to Zoom and Hover for details
Another representations of rules can be through Matrix visualisation of antecedent and consequent itemsets forming the columns and rows of matrix, respectively. Rules with the highest lift are placed in the top-right corner.
plot(rules, method = "matrix", engine = "htmlwidget")